Learning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach
Authors
Abstract
In this paper, we consider residential demand response (DR) programs where an aggregator calls upon some residential customers to change their demand so that the total load adjustment is as close to a target value as possible. Major challenges lie in the uncertainty and randomness of customer behaviors in response to DR signals, and in the aggregator's limited knowledge of the customers. To learn and select the right customers, we formulate the DR problem as a combinatorial multi-armed bandit (CMAB) problem with a reliability goal. We propose a learning algorithm, CUCB-Avg (Combinatorial Upper Confidence Bound-Average), which utilizes both upper confidence bounds and sample averages to balance the tradeoff between exploration (learning) and exploitation (selecting). We prove that CUCB-Avg achieves O(log T) regret given a time-invariant target, and o(T) regret when the target is time-varying. Simulation results demonstrate that CUCB-Avg performs significantly better than the classic algorithm CUCB (Combinatorial Upper Confidence Bound) in both time-invariant and time-varying scenarios.
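The abstract's core idea, using optimistic UCB indices to decide *which* customers to explore while using sample averages to decide *how many* to include so the estimated total reduction matches the target, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the confidence-bonus constant, the uniform response noise, the greedy fill-to-target rule, and the class name `CUCBAvg` are all assumptions made for the example.

```python
import math
import random

class CUCBAvg:
    """Illustrative sketch of a CUCB-Avg-style learner: select a subset of
    customers whose total expected load reduction approximates a target.
    Each customer i reduces load by a random amount with unknown mean."""

    def __init__(self, n_customers):
        self.n = n_customers
        self.counts = [0] * n_customers   # times each customer was selected
        self.means = [0.0] * n_customers  # sample-average observed reduction

    def ucb(self, i, t):
        if self.counts[i] == 0:
            return float("inf")           # force initial exploration
        # Confidence bonus (constant chosen for illustration only)
        return self.means[i] + math.sqrt(3 * math.log(t) / (2 * self.counts[i]))

    def select(self, target, t):
        # Rank customers optimistically by UCB (drives exploration)...
        order = sorted(range(self.n), key=lambda i: self.ucb(i, t), reverse=True)
        chosen, est_total = [], 0.0
        # ...but stop once the SAMPLE-AVERAGE total reaches the target
        # (averages, not UCBs, drive the target-matching decision).
        for i in order:
            if est_total >= target:
                break
            chosen.append(i)
            est_total += self.means[i]
        return chosen

    def update(self, chosen, observed):
        # observed[i]: realized load reduction of customer i this round
        for i in chosen:
            self.counts[i] += 1
            self.means[i] += (observed[i] - self.means[i]) / self.counts[i]
```

A short simulation shows the learner concentrating on the customers whose mean reductions best fill the target while still periodically re-exploring the rest, which is the exploration/exploitation balance the abstract describes.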
Similar Sources
A Bayesian Bandit Approach to Personalized Online Coupon Recommendations
A digital coupon-distributing firm selects coupons from its coupon pool and posts them online for its customers to activate. Its objective is to maximize the total number of activating clicks made by sequentially arriving customers. This paper resolves this problem by using a multi-armed bandit approach to balance the exploration (learning customers' preference for coupons) with ex...
A quality assuring multi-armed bandit crowdsourcing mechanism with incentive compatible learning
We develop a novel multi-armed bandit (MAB) mechanism for the problem of selecting a subset of crowd workers to achieve an assured accuracy for each binary labelling task in a cost optimal way. This problem is challenging because workers have unknown qualities and strategic costs.
Budgeted Learning, Part I: The Multi-Armed Bandit Case
We introduce and motivate the task of learning under a budget. We focus on a basic problem in this space: selecting the optimal bandit after a period of experimentation in a multi-armed bandit setting, where each experiment is costly, our total costs cannot exceed a fixed pre-specified budget, and there is no reward collection during the learning period. We address the computational complexity ...
An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies fo...
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer...